Abstract

Background: Understanding the genetic basis of adaptive evolution is one of the major goals in evolutionary
biology. Recently, it has been revealed that gene copy number variations (GCNVs) constitute a significant proportion
of genomic diversity within natural populations. However, it has been unclear whether GCNVs are under positive
selection and contribute to adaptive evolution. Parallel evolution refers to adaptive evolution of the same trait in 
related but independent lineages, and three-spined stickleback (Gasterosteus aculeatus) is a well-known model 
organism. Through identification of genetic variations under parallel selection, i.e., variations shared among related 
but independent lineages, evidence of positive selection is obtained. In this study, we investigated whole-genome 
resequencing data from the marine and freshwater groups of three-spined sticklebacks from diverse areas along 
the Pacific and Atlantic Ocean coastlines, and searched for GCNVs under parallel selection. 

Results: We identified 24 GCNVs that showed significant differences in the numbers of mapped reads between the 
two groups, and this number was significantly larger than that expected by chance. The derived group, i.e., 
freshwater group, was typically characterized by larger gene-copy numbers, which implied that gene duplications 
or multiplications helped with adaptation to the freshwater environment. Some of the identified GCNVs were those 
of multigenic family genes, which is consistent with the theory that fatal effects due to copy-number changes of 
multigenic family genes tend to be less than those of single-copy genes. 

Conclusion: The identification of GCNVs that were likely under parallel selection suggests that the contribution of
GCNVs should be considered in studies on adaptive evolution. 
Background

Understanding the genetic basis of adaptive evolution is 
one of the major goals in evolutionary biology [1-5]. 
When populations adapt to new environments, positive se- 
lection can increase frequencies of specific genetic varia- 
tions that have greater fitness than others, sometimes 
resulting in the fixation of those variations [1-3]. To detect 
positive selection, two major approaches have achieved sig- 
nificant success. One approach is molecular evolutionary 
analysis of protein-coding gene sequences. Comparison of
the synonymous and nonsynonymous nucleotide substitu- 
tion rates has been adopted by many studies to identify 
positive selection [1,6]. While this approach is applicable to 
only protein-coding genes that have accumulated sufficient 
numbers of nucleotide substitutions, the other approach 
targets shorter time-scale events by detecting the fixation of 
single nucleotide variations (SNVs) within populations [1]. 
Many SNVs were found to be associated with phenotypic 
variations, including cis-elemental SNVs that affect gene expression
levels (e.g., [7]). Analyses of polymorphism distributions
have revealed positive selection of a number of
SNVs (e.g., [8,9]). 

These approaches focused on positive selection on varia- 
tions due to nucleotide substitutions. However, it has re- 
cently been revealed that copy number variations (CNVs), 
or gains or losses of DNA segments, constitute a significant
proportion of genomic diversity [10-15]. Because CNVs are 
known to result in significant phenotypic effects that in- 
clude human diseases [16], they are also expected to be 
under positive selection. In particular, gene copy number 
variations (GCNVs), which change the numbers of gene 
loci in genomes, can significantly alter gene functions and
dosages [17,18]. As expected, the possibility of fixation of 
CNVs by positive selection has been reported in several 
phylogenetic groups [19,20]. 

Parallel evolution, which is the adaptive evolution of 
the same trait in related but independent lineages, can 
provide evidence of positive selection, because genetic 
drift is unlikely to produce concerted changes in independent
lineages [21]. The marine and freshwater phenotypes
of three-spined sticklebacks (Gasterosteus aculeatus)
are an excellent system to investigate parallel evolution 
[21]. This species inhabits a large number of marine, estu- 
arine, and freshwater environments in Asia, Europe, and 
North America. After the retreat of Pleistocene glaciers, 
marine ancestors colonized and adapted to newly
created freshwater habitats around the world, showing repeated
changes in the body shape, skeletal armor, trophic
specialization, pigmentation, salt handling, life history, and 
mating preference [22,23]. Previous studies revealed that 
this independent evolution of similar phenotypes in the 
freshwater groups occurred due to parallel selection on 
the globally shared, standing SNVs in the same genes in 
different freshwater populations, providing strong evi- 
dence that positive selection on these SNVs contributed to 
the adaptive evolution toward the freshwater environ- 
ments [24-26]. Recently, Feulner et al. [27] reported a sig- 
nificant number of CNVs in a marine population of the 
sticklebacks. Therefore, as with SNVs, GCNVs may also be
under parallel selection during the evolution of sticklebacks.
To investigate this possibility, we analyzed whole-genome
resequencing data from marine and freshwater
groups of three-spined sticklebacks and searched for 
GCNVs that contributed to the parallel evolution of the 
three-spined sticklebacks. 

Results and discussion 

GCNVs that likely contributed to the parallel evolution of 
three-spined sticklebacks 

We downloaded whole-genome resequencing data of 10 
marine and 10 freshwater individuals of three-spined 
sticklebacks (Jones et al. [26]) from NCBI Sequence 
Read Archive (SRA, [28]). Both groups consisted of in- 
dividuals that were derived from diverse areas along the 
Pacific and Atlantic Ocean coastlines (Additional file 1: 
Table S1). Thus, genetic variations that were specifically
shared among individuals in the freshwater (and marine)
group were likely due to parallel selection. To increase the
sensitivity of detecting GCNVs under parallel selection,
we devised a novel approach that was based on a statistical
method (Figures 1A and 1B). The sequenced reads from
each of the 20 individuals were mapped to the reference 
stickleback genome, and the numbers of the mapped reads 
were counted for each gene to estimate changes in their 
copy numbers. Genes that showed significant differences 
in the numbers of mapped reads between both groups 
were identified as GCNVs likely under parallel selection 
(Figures 1A and 1B; see Methods).

Twenty-four genes showed significant differences in the 
numbers of mapped reads between both groups (Figure 2 
and Table 1). Among these genes, five showed more cop- 
ies in the individuals of the marine group (freshwater-de- 
creased GCNVs) and 19 showed more copies in those of 
the freshwater group (freshwater-increased GCNVs). We 
confirmed that the number of the identified GCNVs was 
significantly larger than that expected by chance based on 
a permutation test (p < 0.05) for each mapping option.
Collectively, these results suggested that the 24 GCNVs 
were likely due to parallel selection. Note that the low (2.3×)
coverage of the resequencing data [26] would have led to
underestimation of the numbers of GCNVs between the
marine and freshwater groups. Higher sequencing coverage
may result in the detection of more GCNVs.

Among the identified GCNVs, neurexophilin and
PC-esterase domain family member 3 (NXPE3) overlapped
with a region that was reported as a CNV in a marine
group of three-spined sticklebacks [27]. In addition, the
identified GCNVs included well-known multigenic families
such as sulfotransferase (SULT), NOD-like receptor
(NLR), apolipoprotein L (APOL), kinesin family (KIF),
and myosin heavy chain (MyHC). The finding that the
identified GCNVs included genes in multigenic families
was consistent with the idea that GCNVs of multigenic
family genes are more likely to occur than those of single-copy
genes, because fatal effects due to copy-number
changes of multigenic family genes tend to be less
than those of single-copy genes [29]. Notably,
GCNVs were previously observed for APOL [30], KIF
[31], and SULT [32] in primates and for MyHC in fish [33].

Segmental duplications/multiplications or deletions 
behind the identified GCNVs 

An important characteristic of the 24 GCNVs likely 
under parallel selection was that they frequently ap- 
peared at close locations on the genomes (Figure 2). 
This observation implied that those GCNVs would have 
resulted from segmental duplications/multiplications or 
deletions of genomic regions that contained multiple 
genes (i.e., gene clusters). Figure 3 represents the ratios 
of the numbers of reads that were mapped to genes in 
and around the gene clusters in the linkage groups VIII 
and XIX, which were suspected to have experienced 
segmental duplications or deletions. This observation 
Figure 1 Schematic diagram of the method for identifying GCNVs likely under parallel selection. (A) Re-sequenced reads (thin lines) from
each individual were mapped to the stickleback reference genome (thick lines). (B) The numbers of mapped reads that overlapped with genes
were counted, and we searched for genes that showed significant differences in the normalized read numbers between the freshwater (closed 
circles) and marine groups (open circles) with a false discovery rate (FDR) < 0.05. Genes that showed significant differences under the three 
mapping options were regarded as GCNVs likely under parallel selection. (C) The number of different allelic sequences was counted for each of 
the identified GCNVs by enumerating every pair of SNV positions that was located within the read length. If three or more allelic sequences were 
observed for a gene, the GCNV involved duplications or multiplications. 



was consistent with a previous study that reported that 
CNVs sometimes involve segmental duplications [20].

Next, we compared the locations of the 24 GCNVs 
with divergent regions that were designated by Jones 
et al. [26], because a previous study reported that many
CNVs in primates overlapped with genes under positive 



selection [34]. The divergent regions were three-spined 
stickleback genomic regions whose sequences showed 
signs of parallel evolution of nucleotide variations be- 
tween the marine and freshwater groups. The aforemen- 
tioned gene cluster in the linkage group XIX overlapped 
with the divergent regions, suggesting that both nucleotide 
Figure 2 GCNVs likely under parallel selection. The normalized numbers of mapped reads per 1-Mb gene length for each gene across the
genomes of the (A) freshwater and (B) marine groups. Each black point represents the number for each gene in each individual, and the green
lines represent the mean values for each gene across individuals. (C) The false discovery rate of the edgeR analysis on the differences in the
numbers of mapped reads between the freshwater and marine groups for each gene. Asterisks indicate the positions of the GCNVs under parallel 
selection (FDR < 0.05). 



sequences and copy numbers of the genes in this region 
would have been under parallel selection during adapta- 
tion to the freshwater environment. However, most of the 
GCNVs did not overlap with the divergent regions, which 
suggested that their copy numbers, but not sequences, 
would have been under parallel selection (Table 1). 



Larger gene copy numbers in the derived, freshwater
phenotype 

Among the 24 GCNVs likely under parallel selection, lar- 
ger gene copy numbers were more frequently associated 
with the freshwater group (19 out of 24, Table 1). This was 
consistent with the fact that the freshwater phenotype is 



Table 1 Gene copy number variations likely under parallel selection 
*Gene annotations were based on BLASTX search if Ensembl annotations were unavailable.
Figure 3 Segmental duplications/multiplications or deletions underlying the clusters of GCNVs likely under parallel selection.

Gene clusters that included GCNVs likely under parallel selection located in the linkage groups (A) VIII and (B) XIX are shown with three genes 
upstream or downstream. Each point represents the ratio of the average of the normalized numbers of the mapped reads between the two 
groups. The identified GCNVs with more copies in the marine and freshwater groups are colored in orange and blue, respectively. Genes were
excluded from visualization if the median of the numbers of mapped reads per 100 bp of the gene length was less than one or if no reads were 
mapped in at least one individual. The error bars indicate standard deviations of the ratios that were calculated for pairs of freshwater and marine 
groups derived from the same geographic regions. (If multiple samples were derived from the same geographic region for either group, the 
average of the normalized number of reads was used for the calculation). 



derived, because an increase, rather than a decrease, in
gene copy numbers is expected to facilitate adaptation
to new environments by introducing new physiology 
and morphology to the organism [35]. For example, 
Chen et al. suggested that duplications of protein-coding
genes contributed to the physiological fitness of
Antarctic notothenioids in freezing polar conditions 
[18]. In particular, the freshwater-increased GCNVs included
two genes involved in the inflammatory response
(APOL2, NLRC5) and two genes that were homologous
to MyHC (ENSGACG00000002902, ENSGACG00000002933).
A previous study showed parallel divergences between
littoral and pelagic phenotype pairs of three-spined
stickleback MHC genes, which are key genes in
the immune system and would be associated with para- 
site communities in each habitat [36]. Various types of 
myosin genes were reported to have appeared during 
the evolution of teleost fish, and those variations were 
supposed to have contributed to the adaptation to vari- 
able aquatic conditions [33]. Thus, we expect that those 
GCNVs would have played important roles in adapta- 
tion to the freshwater environment. 



The larger gene copy numbers in the freshwater group 
could be due to the choice of the reference genome se- 
quence. We used the reference genome that was generated 
from a freshwater lineage, thus the mapping efficiency of 
the sequencing data of the marine group might be lower 
for genes that accumulated many SNVs between the mar- 
ine and freshwater groups. To examine whether the de- 
tected GCNVs were derived from the mapping efficiency 
bias toward the freshwater group, we investigated the frequencies
of SNVs of the 19 freshwater-increased GCNVs
using reads that were mapped with the '-e 100' option. 
The most divergent gene was ENSGACG00000015099, 
which contained an average of 1.02 SNVs per 1 kb along 
the gene body in the marine group. This frequency was 
insufficient to produce the observed differences in the 
numbers of mapped reads. Therefore, the mapping effi- 
ciency bias was unlikely to explain the large number of 
the freshwater-increased GCNVs. 
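
The scale of this argument can be illustrated with a rough back-of-the-envelope estimate. It is not part of the original analysis, and it assumes that SNVs occur independently at a uniform per-base rate along the gene, which is a simplification. The function name below is hypothetical.

```python
def fraction_reads_with_snv(snvs_per_kb, read_length):
    """Estimate the fraction of reads that overlap at least one SNV.

    Assumes SNVs occur independently at a uniform per-base rate
    (an illustrative simplification, not the authors' method).
    """
    per_base_rate = snvs_per_kb / 1000.0
    return 1.0 - (1.0 - per_base_rate) ** read_length


# Values from the text: 1.02 SNVs per 1 kb; reads were 36-51 bp long.
for length in (36, 51):
    print(length, round(fraction_reads_with_snv(1.02, length), 4))
```

Under these assumptions, only about 5% of reads or fewer would carry any mismatch to the reference, so even if every such read failed to map, the resulting difference in read counts would remain small.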

GCNVs likely due to duplications or multiplications 

To confirm whether the detected GCNVs under parallel 
selection were due to duplications or multiplications in 
the freshwater group, we counted the numbers of different
allelic sequences within the regions of the GCNVs
(Figure 1C). Two freshwater-increased GCNVs (ENSGACG00000003408
and APOL2) (Figures 4A and 4B) were
strongly predicted to be such GCNVs, because they were 
supported by at least two within-read-length SNV pos- 
ition pairs in three individuals of the freshwater group 
(Table 1 and Additional file 2: Table S2). Read depths along 
the genomic coordinates were not stable probably due to 
sequencing biases, thus their differences were clearly 



[Figure 4 panels: (A) ENSGACG00000003408; (B) region containing ENSGACG00000014553 and ENSGACG00000014556; (C) region containing ENSGACG00000003374 and ENSGACG00000003379. Each panel plots the Fresh and Marine groups against genomic position (Mb).]

Figure 4 Numbers of mapped reads in two freshwater-increased
and one freshwater-decreased GCNVs. Each point and line
represent the normalized numbers and average normalized numbers,
respectively, of the mapped reads per 200-bp non-overlapping window
for 10 freshwater (black) and 10 marine (red) individuals. (A and B)
observed in the regions with large read depths. It was
notable that the read depths in the intronic regions of
APOL2 of the freshwater group were higher than those
of the marine group (Figure 4B), suggesting that this
gene was recently duplicated together with its intronic sequences.
In addition, multiple copies of one freshwater-decreased
GCNV (ENSGACG00000003374) (Figure 4C)
were predicted to exist on the genomes of the marine 
group by the same analysis on the marine group. An- 
other freshwater-decreased GCNV (NXPE3) was also 
supported by at least one within-read-length SNV pos- 
ition pair in three individuals of the marine group 
(Table 1 and Additional file 2: Table S2). The copy num- 
bers of these two genes would have decreased during 
the adaptation to the freshwater environment. 

The APOL2 gene is a member of the apolipoprotein L
gene family. This gene family is involved in pathogen
immunity and was previously reported to have been
under positive selection in primates [37]. Another previous
study found copy number differences in the APOL1
gene between human and chimpanzee and suggested that
these differences were involved in the adaptive phenotype
differentiation of the inflammatory response [30]. The duplications
or multiplications of APOL2 might have contributed
to adaptation of the immune system to the freshwater
environment. For ENSGACG00000003408, we conducted
BLASTX searches against the NCBI nr database because no
functional descriptions were available in the Ensembl database.
The best hit for this gene was a neoverrucotoxin
subunit alpha-like gene of Oreochromis niloticus with an
E-value of 0.0 (accession numbers of the hits were XP_003449498,
XP_003449506, and XP_003449483). This gene
was reported to be overexpressed in the brooding tissue 
of pregnant specimens of a species in genus Syngnathus 
[38], which belongs to the same order as the three- 
spined stickleback does. The duplications or multiplica- 
tions of ENSGACG00000003408 might have had roles in 
pregnancy functions in the freshwater environment. We 
could not obtain any hit for ENSGACG00000003374. A 
previous study reported GCNVs of NXPE3 within mar- 
ine populations [27]. NXPH3 is a neuropeptide-like 
molecule that functions in brain [39], and neuropep- 
tides were suggested to control migratory behaviors 
[40]. The decrease of the NXPE3 copy numbers in the 
freshwater group might have been associated with their 
anadromous behavior [22]. 

Differential expressions of genes between the two 
environments 

If the two strongly supported freshwater-increased GCNVs 
actually contributed to the parallel evolution of the three- 
spined sticklebacks, the amount of transcription products 
of these genes should be important for the adaptation. 
Thus, we analyzed microarray data of gills of three-spined 



Hirase et al. BMC Genomics 2014, 15:735
http://www.biomedcentral.com/1471-2164/15/735






sticklebacks in marine and freshwater groups under the 
short and long photoperiod conditions [41], and evaluated 
whether these two genes showed significant differential expression
between the two groups. As expected, the gene
expression values of APOL2 and ENSGACG00000003408
were significantly higher in the freshwater group than in the
marine group (p < 0.005 after Bonferroni
correction) under the short photoperiod condition. The
short photoperiod condition resembled winter; thus, these
genes might have contributed to fitness through
overwinter survival [42].

Conclusion 

In this study, we showed the possibility that GCNVs 
underwent positive selection in the parallel evolution of 
the three-spined sticklebacks and had a role in the adaptation
to the freshwater environment. Notably,
many CNVs were found in a marine population of
three-spined sticklebacks [27], which suggests the existence
of globally shared, standing CNVs that can contribute
to parallel evolution within natural populations. Our results
suggest that the contribution of GCNVs should be
considered in studies on adaptive evolution of diverse 
species. 

Methods 

Genome sequences 

The three-spined stickleback genome sequence (BROADS1.56)
and the annotated gene models were taken
from the Ensembl database (release 72, [43]). The genome
sequence was generated from a line derived
from a freshwater population (Bear Paw Lake, [26]).

Resequencing data processing 

A resequencing dataset of 10 marine and 10 freshwater 
individuals was previously generated using an Illumina 
Genome Analyzer II (36-51 bp, single-end), which
yielded approximately sixty million reads (approximately
2.3×) per individual (Jones et al. [26], Additional file 1:
Table S1). We downloaded the data from the NCBI Sequence
Read Archive (SRA, [28]). The accession numbers
were SRX077979, SRX079119, SRX079120,
SRX077981, SRX077982, SRX077990, SRX077978, 
SRX076627, SRX079121, SRX077983, SRX077984, 
SRX077986, SRX077980, SRX077988, SRX077989, 
SRX077987, SRX077991, SRX077992, SRX076626, 
SRX077985, SRX077993, and SRX077994. 

The sequenced reads from each individual were mapped 
to the stickleback genome using the Bowtie 0.12.8 software
[44] (Figure 1A). The Bowtie option '-m 1' was
adopted to remove reads with multiple hits. In addition, to 
obtain reliable GCNVs that were not affected by the map- 
ping parameter selection, we adopted three different values 
(70, 100, and 130) for the '-e' option, which designated the 



maximum permitted total quality values at all mismatched 
positions throughout a read alignment. To avoid the effects 
of potential PCR duplicates, if multiple reads were aligned 
to the same position, all of the reads except for those with 
the highest mapping quality were removed using SAMtools
(version 0.1.18, [45]) with the command 'samtools
rmdup -s'. The statistics for each mapping option are
shown in Additional file 1: Table S1.

Identification of GCNVs likely under parallel selection 

We compared the numbers of mapped reads for each gene 
between the freshwater and marine groups to identify 
GCNVs under parallel selection (Figure 1B). If the numbers
of mapped reads were significantly larger in the freshwater
group, the gene would have been duplicated or
multiplied specifically in the genomes of the freshwater 
group. If the numbers were significantly smaller, the gene 
would have been deleted or its copy number would have 
decreased. 

The 5'-most and 3'-most positions of each gene were retrieved
from the Ensembl annotation, and the numbers of
mapped reads that overlapped with this region (i.e.,
any exonic or intronic region) were counted using the
'intersectBed' command in bedtools [46]. Because insufficient
numbers of mapped reads may result in the detection
of false GCNVs, we removed genes from the subsequent
analysis if the median of the numbers of the mapped reads
per 100 bp of the gene lengths was less than one, or if no
reads were mapped in at least one individual's resequencing
data. For normalization, the numbers were divided by the
total number of mapped reads across the genome for each 
individual. Then, we searched for GCNVs under parallel se- 
lection by detecting genes that showed significant differ- 
ences in the normalized read numbers between the 
freshwater and marine groups using the edgeR package 
[47] with a false discovery rate (FDR) < 0.05. We regarded 
genes that were significant under all of the three different 
mapping options ("-e 70", "-e 100", and "-e 130") as GCNVs 
likely under parallel selection. 
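
The filtering and normalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and data layout are hypothetical, and the significance test itself (edgeR, an R package) is not reproduced here, so the per-option lists of significant genes are taken as given.

```python
from statistics import median


def filter_and_normalize(counts, gene_lengths, total_mapped):
    """Apply the coverage filters and library-size normalization.

    counts:       {gene_id: [mapped reads per individual]}
    gene_lengths: {gene_id: gene length in bp}
    total_mapped: [genome-wide mapped reads per individual]
    """
    normalized = {}
    for gene, reads in counts.items():
        per_100bp = [r * 100.0 / gene_lengths[gene] for r in reads]
        # Drop genes whose median depth is below one read per 100 bp.
        if median(per_100bp) < 1.0:
            continue
        # Drop genes with no mapped reads in at least one individual.
        if min(reads) == 0:
            continue
        # Divide by each individual's total number of mapped reads.
        normalized[gene] = [r / t for r, t in zip(reads, total_mapped)]
    return normalized


def consistent_across_options(significant_by_option):
    """Keep only genes called significant under every mapping option."""
    gene_sets = [set(genes) for genes in significant_by_option.values()]
    return set.intersection(*gene_sets)
```

Requiring significance under all three '-e' settings is a conservative choice: a gene whose signal depends on one particular mismatch threshold is discarded.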

To confirm that the number of identified GCNVs under 
parallel selection was significantly larger than that ex- 
pected by chance (i.e., by genetic drift), we calculated an 
empirical p value based on a permutation test. We ran- 
domly reallocated the 10 freshwater and 10 marine indi- 
viduals into two groups 10,000 times, performed the same 
analyses, and obtained the null distribution of numbers 
of GCNVs. 
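
A permutation test of this kind might be sketched as below. The detection step is passed in as a function (`count_hits`, a hypothetical stand-in for the full read-count analysis), and the add-one form of the empirical p value is an assumption, since the text does not state which estimator was used.

```python
import random


def empirical_p_value(samples, labels, count_hits, n_perm=10000, seed=0):
    """Empirical p value for the number of detected GCNVs.

    samples:    per-individual data, in the same order as labels
    labels:     group label of each individual (e.g. "F" or "M")
    count_hits: function(samples, labels) -> number of GCNVs detected
                (hypothetical stand-in for the detection pipeline)
    """
    rng = random.Random(seed)
    observed = count_hits(samples, labels)
    shuffled = list(labels)  # copy so the caller's labels are untouched
    at_least_as_many = 0
    for _ in range(n_perm):
        # Reallocate individuals to groups while keeping group sizes.
        rng.shuffle(shuffled)
        if count_hits(samples, shuffled) >= observed:
            at_least_as_many += 1
    # Add-one estimator (an assumption) avoids reporting p = 0.
    return (at_least_as_many + 1) / (n_perm + 1)
```

Shuffling the labels rather than the data keeps each individual's read counts intact, so only the group assignment is randomized.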

Identification of gene duplications or multiplications 

If the identified GCNVs involved gene duplications or 
multiplications, three or more different allelic sequences 
should be observed within the gene in each individual of 
each group, because three or more different allelic se- 
quences cannot originate from a diploid genome. Thus, 
we examined whether three or more different allelic
sequences were observed in the identified GCNVs
(Figure 1C).

For each of the identified GCNVs, SNVs were called 
by applying the SAMtools/BCFtools pipeline [45] to the 
reads that were mapped with the '-e 100' option. The 
SAMtools/BCFtools pipeline was used with default pa- 
rameters, except for the 30' option, to consider bases 
that were called with high quality only. We enumerated 
every pair of SNV positions that was located within the 
read length, i.e., 36 bp (within-read-length SNV position
pairs). The numbers of different nucleotide pairs for 
each of the within-read-length SNV position pairs were 
counted, where each nucleotide pair was supported by 
multiple reads. Finally, we selected GCNVs that showed 
three or more different nucleotide pairs in at least three 
individuals of either group. 
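
The counting of nucleotide pairs at within-read-length SNV position pairs could look roughly like this for one individual. It is a sketch under assumptions: reads are represented as ungapped (start, sequence) tuples, "multiple reads" is interpreted as at least two, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_supported_pairs(reads, snv_positions, read_length=36, min_support=2):
    """Count distinct nucleotide pairs at each within-read-length SNV pair.

    reads:         list of (start, sequence) ungapped alignments
    snv_positions: reference coordinates of called SNVs
    Returns {(pos1, pos2): number of nucleotide pairs supported by
    at least min_support reads}.
    """
    result = {}
    for p, q in combinations(sorted(snv_positions), 2):
        # Both positions must fit inside a single read.
        if q - p >= read_length:
            continue
        pairs = Counter()
        for start, seq in reads:
            if start <= p and q < start + len(seq):
                pairs[(seq[p - start], seq[q - start])] += 1
        result[(p, q)] = sum(1 for n in pairs.values() if n >= min_support)
    return result
```

A value of three or more for any position pair indicates more allelic sequences than a single diploid locus can carry, which is the signal used to infer duplication or multiplication.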

Gene annotations 

For each GCNV likely under parallel selection, we ob- 
tained functional annotations of the gene from the 
Ensembl database. If the functional annotations were un- 
available, BLASTX searches [48] against the NCBI non- 
redundant protein database (nr) [49] were conducted with 
an E-value cutoff of 1e-14, and the hit with the highest
bit-score and its annotated protein name was retrieved. 
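
The hit-selection rule can be expressed compactly. In this sketch the tuple layout of a hit is a hypothetical representation of parsed BLASTX output, and treating the cutoff as inclusive is an assumption.

```python
def best_hit(hits, evalue_cutoff=1e-14):
    """Return the passing hit with the highest bit score, or None.

    hits: iterable of (subject_id, protein_name, evalue, bit_score)
          tuples (a hypothetical layout for parsed BLASTX output).
    """
    # Keep hits at or below the E-value cutoff (inclusive by assumption).
    passing = [h for h in hits if h[2] <= evalue_cutoff]
    if not passing:
        return None
    return max(passing, key=lambda h: h[3])
```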

Microarray data analysis 

Microarray data of gills of two families of pure marine 
and pure freshwater crosses under short and long photo- 
periods [41] were downloaded from Center for Informa- 
tion Biology Gene Expression (http://cibex.nig.ac.jp) with 
the accession number CBX139. Two marine and fresh- 
water datasets were treated as biological replicates. If 
multiple probes were mapped to one transcript, the 
median signal intensity of these probes was used. After 
removing intra-gene probes, genes with significant 
expression-value differences between the marine and 
freshwater groups were identified using the eBayes 
method in the limma package [50]. 
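
The probe-to-transcript summarization step can be sketched as follows. The data layout and function name are hypothetical, and the differential expression test itself (the eBayes method in limma, an R package) is not reproduced here.

```python
from collections import defaultdict
from statistics import median


def collapse_probes(probe_signal, probe_to_transcript):
    """Summarize probe intensities per transcript by their median.

    probe_signal:        {probe_id: signal intensity}
    probe_to_transcript: {probe_id: transcript_id}
    Probes without a transcript assignment are ignored.
    """
    grouped = defaultdict(list)
    for probe, signal in probe_signal.items():
        transcript = probe_to_transcript.get(probe)
        if transcript is not None:
            grouped[transcript].append(signal)
    return {t: median(values) for t, values in grouped.items()}
```

Using the median rather than the mean limits the influence of a single aberrant probe on the transcript-level value.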

Additional files 



Additional file 1: Table S1. Summary of the resequencing datasets of